VAE-SPACE: Deep Generative Model of Voice Fundamental Frequency Contours
Abstract
Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F0) contours of normal speech and singing voices. The proposed model 1) can accurately decompose an F0 contour into the sum of the phrase and accent components of the Fujisaki model, a mathematical model describing the control mechanism of vocal fold vibration, without an iterative algorithm, and 2) can represent and generate F0 contours of both normal speech and singing voices reasonably well.
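For readers unfamiliar with the decomposition the abstract refers to, the following is a minimal sketch of the classical Fujisaki model in its commonly cited form, in which log F0 is a baseline plus a phrase component (impulse commands passed through a critically damped second-order filter) plus an accent component (rectangular commands passed through a second filter with a ceiling). It is not the deep generative model proposed in the paper. The function names, the command timings and amplitudes, and the constants `alpha`, `beta`, and `gamma` are illustrative assumptions, set to typical textbook values.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    # Phrase filter impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0.
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)
    return alpha**2 * tp * np.exp(-alpha * tp)

def accent_response(t, beta=20.0, gamma=0.9):
    # Accent filter step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0.
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)
    return np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)

def fujisaki_log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Return (log_f0, phrase, accent) on the time grid t (seconds).

    phrase_cmds: list of (onset_time, magnitude) impulses.
    accent_cmds: list of (onset_time, offset_time, amplitude) rectangles.
    fb: baseline F0 in Hz.
    """
    t = np.asarray(t, dtype=float)
    phrase = np.zeros_like(t)
    for onset, magnitude in phrase_cmds:
        phrase += magnitude * phrase_response(t - onset, alpha)
    accent = np.zeros_like(t)
    for onset, offset, amplitude in accent_cmds:
        accent += amplitude * (
            accent_response(t - onset, beta, gamma)
            - accent_response(t - offset, beta, gamma)
        )
    return np.log(fb) + phrase + accent, phrase, accent

# Hypothetical example: one phrase command and one accent command (made-up values).
t = np.linspace(0.0, 3.0, 301)
log_f0, phrase, accent = fujisaki_log_f0(
    t, fb=100.0, phrase_cmds=[(0.0, 0.5)], accent_cmds=[(0.3, 0.8, 0.4)]
)
f0_hz = np.exp(log_f0)  # contour in Hz
```

The sketch runs in the forward direction only, from commands to contour. The decomposition task described in the abstract is the inverse problem: recovering the phrase and accent components from an observed F0 contour.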
Similar resources
Concept Formation and Dynamics of Repeated Inference in Deep Generative Models
Deep generative models are reported to be useful in broad applications including image generation. Repeated inference between data space and latent space in these models can denoise cluttered images and improve the quality of inferred results. However, previous studies only qualitatively evaluated image outputs in data space, and the mechanism behind the inference has not been investigated. The...
Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework
We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. O...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
Adversarial examples for generative models
We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model...
Generative modeling of speech F0 contours
This paper introduces our ongoing work on generative modeling of speech fundamental frequency (F0) contours for estimating prosodic features from raw speech data. The present F0 contour model is formulated by translating the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration, into a probabilistic model described as a discrete-time stocha...
Publication date: 2017